Pressure vessel-oriented visual inspection method based on deep learning

Measuring the surface parameters of pressure vessel welds helps guarantee safe operation. To address the low efficiency and poor accuracy of traditional manual inspection, this study proposes a method for measuring weld morphological parameters that combines vision and structured light. First, a feature point extraction algorithm for weld parameters based on deep convolution is proposed. An accurate method for extracting weld image feature point coordinates is designed by combining a loss function for seam undercut feature recognition with a weld feature point extraction network structure. Second, a training data enhancement method based on the third-order non-uniform rational B-spline (NURBS) curve is proposed to reduce the amount of data that must be collected for training. Finally, a pressure vessel measurement device was designed, the feature point extraction performance of the deep network proposed in this study was compared with that of the common feature point extraction networks DeepLabCut and HR-net, and the theoretical accuracy of the surface parameter measurement was analyzed. The results indicate that the theoretical accuracy of the parameter measurements is within 0.065 mm.


Introduction
The type A and type B butt welds of pressure vessels are important factors that affect stress. The weld surface is characterized by four parameters: weld width, reinforcement, undercut, and misalignment, as specified in the relevant standards. These parameters directly reflect the stress concentration at the welding position, and their measurement is an important basis for evaluating the welding quality [1,2]. Currently, the relatively mature parameter measurement methods are performed manually using a magnifying glass, weld inspection ruler, angle ruler, and other tools. However, these methods exhibit low accuracy, low efficiency, large workload, and high labor intensity. Given the advantages of machine vision inspection, it has been researched and applied to varying degrees in welding process inspection and in the measurement of welding seam surface parameters.
Based on the imaging light source, machine vision weld inspection methods can be classified into two branches: passive and active visual weld inspection. The passive vision welding seam detection method involves imaging in a natural light environment, using NURBS curve segmentation and fitting. The influence of image noise was significantly reduced via fitting because the derivative of the fitted curve was used instead of directly determining the slope of the curve. However, the fitted curve differed from the actual curve, which in turn caused feature point position errors. The representative products of active visual weld inspection include the weld measuring instrument for various groove welds devised by Servo-Robot of Canada [20], which can measure the reinforcement, width, and undercut, and the seam tracking system developed by the British MetaVision Company [21], which can monitor the longitudinal and circumferential seam welding process of welding robots. Compared with the passive visual weld inspection method, the active visual weld detection method better reflects the image characteristics of the weld parameters and places lower requirements on the inspection environment. It has become the mainstream method for weld visual tracking, identification, and inspection. However, the applicability of current feature-point extraction algorithms should be improved: no existing algorithm can simultaneously detect the four parameters of weld width, reinforcement, undercut depth, and misalignment. In recent years, deep learning has advanced rapidly in machine vision research. The processing methods for extracting specified feature points from images are classified as pose estimation algorithms. The Deep-Pose network for image feature estimation was first proposed by Google in 2014 [22].
The front end of the network uses a deep convolutional neural network (CNN) to extract image feature information at multiple scales, and the back end connects the multi-scale features output by the convolutional layers to fully connected (FC) layers, which complete the coordinate extraction of the feature points in the image coordinate system. Owing to the limited feature extraction performance of CNNs at the time, the accuracy of Deep-Pose was low. Megvii Technology (2018) proposed a cascaded pyramid network (CPN) [23], which is divided into Global-Net and Refine-Net. Global-Net uses a pyramid-structured CNN to extract multiscale image features, and Refine-Net regresses the image feature points. To accurately estimate the coordinate information of feature points, the complex and inefficient FC layer of Deep-Pose was abandoned in favor of a simple and efficient regression structure composed of a deconvolution layer and a mean pooling layer. To date, this is the best network for feature point extraction. A simple and effective bottom-up structure, DeepLabCut, was proposed in [24] for pose estimation of animals; the network exhibits good transfer performance and can be applied to feature point extraction of other objects. In [25], a residual step network structure was proposed to fuse features of the same spatial dimension (intra-level features) to further optimize the key point locations. To solve the severe degradation of feature map resolution in traditional CNN structures, [26] proposed HR-net, which maintains a high-resolution feature map output during the convolution calculation. Later, [27] designed a bottom-up human pose estimation structure based on HR-net. Reference [28] proposed the TokenPose network structure, which replaces the image convolution calculation with the transformer structure from the NLP research field.
The feature extraction AP of this network is close to the results in the literature and the network structure is lighter, but it is only suitable for human pose estimation. Therefore, this study attempts to combine a deep learning image feature extraction method with active visual weld inspection technology to provide a new research idea for the field of weld inspection.
The main contributions of this study are as follows: 1. We analyzed the reasons why the standard welding seam surface parameter measurement method cannot be used for actual pressure pipeline inspections, and we proposed a measurement index for welding seam surface parameters that accounts for the coexistence of multiple surface parameters. 2. A deep convolutional neural network-based image parameter feature point extraction approach can simultaneously extract all parameter feature points in a single image. 3. We proposed a method based on the third-order NURBS curve for enhancing the image data of the weld surface profile of a pressure vessel, which can significantly reduce the amount of data that must be collected for deep network training.
The remainder of this paper is organized as follows. The section on modeling and numerical analyses provides the design details of a weld surface measuring device based on a laser profile sensor and an overview of the weld surface parameter calculation algorithms. The results section presents experimental results that confirm the validity of the proposed methods. Finally, conclusions are provided in the discussion section.

Modeling and numerical analyses
Weld surface parameters measuring device based on laser profile sensor. A system for measuring the weld surface parameters based on a structured light model, consisting of a laser profile sensor, an electric Z-axis slide, a manual Y-axis slide, and a computer, is shown in Fig 1. A KEYENCE LJ-V7080 laser profile sensor with a 32-mm built-in camera and a built-in laser with a wavelength of 405 nm was used in the experiment. The measuring ranges of the sensor in the X- and Y-axis directions were 20 mm and 46 mm, respectively. The sensor was installed directly above the cylinder for measurement, and its imaging distance was within this range. The line-shaped laser emitted by the sensor hit the surface of the pressure vessel to be measured. The camera inside the sensor captured images near the linear laser range in real time, and the point cloud data were output to the computer via an internal algorithm. The sensor was fixed on the Z-axis electric guide rail, and the line laser emission surface was parallel to the cross-section of the pressure vessel cylinder. The Z-axis electric slide rail was connected to the Y-axis manual slide rail via a bracket, allowing the sensor to move in both directions. The laser sensor and pressure pipe cylinder were kept tangential to the movement direction during the detection procedure.
The laser profile sensor-based welding seam surface profile parameter detection system described in this study is shown in Fig 2. The entire detection process is as follows: ① the laser profile sensor emits a laser that is perpendicular to the weld surface of the pressure vessel; ② the point set S generated by the sensor is preprocessed to generate the weld contour curve image G; ③ image G is input to the deep learning network, which outputs the characteristic points of the weld parameters as coordinates P(x, y); ④ the parameter values are calculated according to the proposed numerical calculation index of the weld surface parameters.
In general, the weld contour curve image G is generated from the weld contour point set S, and the image must highlight and accurately reflect the surface contour characteristics of the weld. The common image-production method first creates an m×n×3 three-channel black matrix $I_{back}$. Then, to generate image G, each point of the contour point set is drawn on the black background image as a monochrome pixel. This image-generating method can precisely depict the surface contour properties of the weld, but the ratio of image content (contour pixels) to background (black background) is extremely small, as low as 1/10000, and the model can be over-fitted if such an image is used for subsequent deep learning training. Therefore, after the black background image is created using the above method, the following formula is used to generate a contour curve image G with a larger ratio of image content to background. Let Pix be the RGB value of the contour point color and $\langle \cdot \rangle$ denote rounding down. Then, G can be expressed as follows:

Weld surface parameter measurement index with multiple defect parameters
The reinforcement of butt joint longitudinal and girth welds, as well as three other measured surface parameters (width, undercut, and misalignment), were defined in accordance with the AWS A3.0 "Definition of Standard Welding Terms" [29], ISO 5817 "Welding Joints" [30], AWS D1.1 "Welding Specification for Steel Structures" [31], and ASME VIII "Boiler and Pressure Vessel Manufacturing Code" standards [32]. A schematic of the four-parameter measurement requirements for butt welds is shown in Fig 3. According to the AWS A3.0 "Definition of Standard Welding Terms," the weld width $l_{width}$ is defined as the distance between the two weld toes, where a weld toe is the junction between the weld surface and base metal. The weld reinforcement $h_{re}$ is the amount by which the weld metal exceeds the height of the fillet welding groove. The weld undercut $h_{un\_cut}$ denotes the size of the grooves or depressions produced in the base metal along the weld toe owing to improper selection of welding parameters or incorrect operation methods. Fig 3(A) shows the definitions of weld width $l_{width}$, weld reinforcement $h_{re}$, and weld undercut $h_{un\_cut}$ in the standard. Weld misalignment is defined by the ASME VIII "Boiler and Pressure Vessel Manufacturing Code" standard as the phenomenon of dislocation and unevenness due to welding deviation, deformation, and other factors during welding. The weld misalignment $h_{align}$ denotes the size and amplitude of the misalignment, as shown in Fig 3(B).
Each weld appearance parameter defined and illustrated in the standards assumes that the corresponding weld defect exists alone, that is, the definition is a measurement index under ideal weld conditions. An initially formed weld can exhibit multiple coexisting defects. When weld defects coexist (for example, under the influence of misalignment), the definitions of the welding seam parameter measurement indices in the standard example diagrams are no longer applicable. Hence, with reference to the measurement diagrams of weld appearance parameters in the relevant standards, this study discusses the appearance parameters at the weld cross-section under three conditions: normal welds without defects, single-defect welds, and multi-defect welds. The misalignment feature points coincide with the width feature points in the cases of no defect parameters and single misalignment defects. However, given that the weld toe on the undercut side disappears when an undercut defect is present, the width feature point is modified to the junction point of the undercut depression curve and welding curve. The misalignment feature point in this case is the intersection point between the undercut curve and base material curve, as shown in Fig 4(C) and 4(D), respectively. As the above feature point selection examples show, the parameter feature points in normal welds are extreme points or corner points, and traditional curve extremum and derivative analysis methods can complete the feature point extraction task. With undercut defects, however, the width feature points of the cross-section curve are weak. It is therefore difficult to simultaneously extract all of the characteristic points based on traditional curve analysis methods, and to date, no method for simultaneously extracting all of the appearance parameters has been proposed.
Consequently, in terms of image processing feature analysis, in this study, we employed deep learning image semantic segmentation methods to classify weld defects and extracted four parameter feature points of welds in laser curve images.

Design of image feature point extraction network based on CNN
The structural diagram of the coding-decoding-free image feature point extraction network (EDE-net) proposed in this study is depicted in Fig 5. The input of this network is the preprocessed laser profile image of the weld, and the output of the network is the pixel position of each parameter feature point. The coding part of the network is composed of the CNN backbone. First, on branch one of the decoding part, the feature map output by the CNN backbone is upsampled to produce an n-dimensional feature map M. The location of the i-th feature point, roughly extracted by the network, can then be expressed as $(x^i_{rough}, y^i_{rough}) = \arg\max_{(x, y)} M_i(x, y)$ (i = 1, ..., n). On branch two of the decoding part, the output feature map N with 2n dimensions is sampled and processed to obtain the feature point correction information $(x^i_{cor}, y^i_{cor})$. Finally, the position of the parameter feature point in the input image, namely $(x^i, y^i) = (x^i_{rough} + x^i_{cor}, y^i_{rough} + y^i_{cor})$, can be obtained by combining the information of the two output feature maps M and N.
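The combination of the two branch outputs can be illustrated with a minimal sketch. It assumes that the maps are plain nested lists, that the correction for feature point i is read from N at the rough location, and that the 0-based channels 2i and 2i+1 hold the X and Y corrections; the function name `decode_feature_points` is hypothetical and is not taken from the original implementation.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def decode_feature_points(M: List[Map2D], N: List[Map2D]) -> List[Tuple[float, float]]:
    """Combine rough argmax locations from M with corrections read from N."""
    points = []
    for i, score_map in enumerate(M):
        rows, cols = len(score_map), len(score_map[0])
        # Rough location: the pixel with the highest score in channel i.
        y_rough, x_rough = max(
            ((y, x) for y in range(rows) for x in range(cols)),
            key=lambda yx: score_map[yx[0]][yx[1]],
        )
        # Assumed layout: channels 2i and 2i+1 (0-based) store the X and Y corrections.
        x_cor = N[2 * i][y_rough][x_rough]
        y_cor = N[2 * i + 1][y_rough][x_rough]
        points.append((x_rough + x_cor, y_rough + y_cor))
    return points


if __name__ == "__main__":
    M = [[[0.0, 0.0, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.0]]]
    N = [
        [[0.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.0, -0.5, 0.0], [0.0, 0.0, 0.0]],
    ]
    print(decode_feature_points(M, N))
```

Reading the correction only at the rough location keeps the decoding step cheap, since the full correction map never needs to be scanned.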
The up-sampling methods available for the output feature map of the EDE-net coding structure include bilinear interpolation, deconvolution, and depooling. Let the up-sampling multiple be $k_{upsample}$, the input be an image I of scale $ih \times iw \times 3$, and the up-sampled output be an image O of scale $(ih \cdot k_{upsample}) \times (iw \cdot k_{upsample}) \times 3$; the pixel value $O_{kp_x, kp_y}$ at pixel position $(kp_x, kp_y)$ in image O is sampled from and mapped to the pixel value $I_{p_x, p_y}$ at pixel position $(p_x, p_y)$ of image I. The up-sampled image O after bilinear interpolation is determined as follows: where $p_x = \langle kp_x / k_{upsample} \rangle + u$, $p_y = \langle kp_y / k_{upsample} \rangle + v$, and $\langle \cdot \rangle$ denotes rounding down. Based on the formula, it is evident that the bilinear interpolation up-sampling method must traverse each pixel, which entails a large amount of calculation and a slower running speed.
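The per-pixel traversal described above can be sketched as follows. This is one common formulation of bilinear up-sampling for a single-channel image, given only as an illustration: the border clamping and the function name `bilinear_upsample` are assumptions, and the exact interpolation formula used in the study may differ.

```python
import math
from typing import List


def bilinear_upsample(image: List[List[float]], k: int) -> List[List[float]]:
    """Up-sample a single-channel image by an integer factor k."""
    ih, iw = len(image), len(image[0])
    out = [[0.0] * (iw * k) for _ in range(ih * k)]
    for ky in range(ih * k):
        for kx in range(iw * k):
            # Source position = rounded-down integer part + fractional part (u, v).
            sx, sy = kx / k, ky / k
            x0, y0 = math.floor(sx), math.floor(sy)
            u, v = sx - x0, sy - y0
            # Clamp the right and bottom neighbours at the image border.
            x1, y1 = min(x0 + 1, iw - 1), min(y0 + 1, ih - 1)
            out[ky][kx] = (
                (1 - u) * (1 - v) * image[y0][x0]
                + u * (1 - v) * image[y0][x1]
                + (1 - u) * v * image[y1][x0]
                + u * v * image[y1][x1]
            )
    return out


if __name__ == "__main__":
    for row in bilinear_upsample([[0.0, 2.0]], 2):
        print(row)
```

The double loop visits every output pixel and reads four input pixels each time, which is the source of the computational cost noted in the text.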
The deconvolution up-sampling mechanism is shown in Fig 6. Filler pixels (usually with a pixel value of 0) are inserted between adjacent pixels of image I and around its border. The deconvolution output image O is then obtained by convolving the supplemented image with the convolution kernel. For de-pooling, the pooling kernel is set according to the maximum or average pooling method, and the image supplement step is the same as in deconvolution up-sampling. The up-sampling magnification of the deconvolution and de-pooling methods is lower, but the amount of calculation is smaller. In deconvolution, the convolution kernel participates in the entire network training and its weights are updated, whereas the weights of the pooling kernel cannot be changed. Therefore, EDE-net uses the deconvolution calculation as the up-sampling method.
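A minimal sketch of the zero-insertion mechanism is given below for a single channel. It assumes a square kernel, full zero padding around the border, and a plain sliding-window product without kernel flipping; the helper names `zero_insert`, `conv2d_valid`, and `deconv_upsample` are hypothetical. In a trained network the kernel weights would be learned rather than fixed.

```python
from typing import List

Grid = List[List[float]]


def zero_insert(image: Grid, stride: int, pad: int) -> Grid:
    """Insert (stride - 1) zeros between neighbouring pixels and pad the border with zeros."""
    ih, iw = len(image), len(image[0])
    h = (ih - 1) * stride + 1 + 2 * pad
    w = (iw - 1) * stride + 1 + 2 * pad
    out = [[0.0] * w for _ in range(h)]
    for y in range(ih):
        for x in range(iw):
            out[pad + y * stride][pad + x * stride] = float(image[y][x])
    return out


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Stride-1 sliding-window product over positions where the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + j][x + i] * kernel[j][i] for j in range(kh) for i in range(kw))
            for x in range(ow)
        ]
        for y in range(oh)
    ]


def deconv_upsample(image: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Up-sample by zero insertion followed by an ordinary convolution."""
    pad = len(kernel) - 1  # assumes a square kernel; every input pixel meets every kernel tap
    return conv2d_valid(zero_insert(image, stride, pad), kernel)


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    for row in deconv_upsample([[1.0, 2.0], [3.0, 4.0]], ones):
        print(row)
```

With a 2×2 input, a 3×3 kernel, and a stride of 2, the output is 5×5, that is, (ih − 1) · stride + kernel size along each axis.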
Separating the pixels whose distance from the feature point is within the threshold T from the background can be treated as a binary classification task. Hence, the focal loss used for feature classification tasks can be adopted as the loss function for the theoretical output feature map of branch one. The adjustment parameters α and γ are the same as those of the focal loss for feature classification tasks [33]. Therefore, the branch-one loss function $L_M$ is as follows: where the elements of matrix S are all one. EDE-net branch 2 is a feature point position-correction task. The deep feature map of the convolutional network is processed by a single deconvolution layer with a stride of 2, a kernel scale of 3×3, and 2n kernels, which yields a 2n-dimensional feature map N. The theoretical output results N of EDE-net branch 2 are shown in Fig 8. On top of the approximate feature point locations given by M, the theoretical output feature map N of branch two adds a feature point location correction value. That is, at the element positions where the theoretical value of $M_i$ is true, $N_{2i-1}$ and $N_{2i}$ hold the pixel differences between the theoretical feature point and the element's position along the X- and Y-axes of the image coordinate system. Therefore, the theoretical outputs $N_{2i-1}$ and $N_{2i}$ of branch 2 are defined as follows: where c denotes the ratio of the input image scale to the output feature map scale after upsampling. The feature point position correction task is a numerical regression task. Therefore, the Huber loss is used to establish the loss function:
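The two loss types named above can be sketched in their standard textbook forms. This is only an illustration under stated assumptions: the default values of `alpha`, `gamma`, and `delta`, the averaging over elements, and the flattened list inputs are choices made here, and the exact expressions used for $L_M$ and the branch 2 loss in the study may differ.

```python
import math
from typing import Sequence


def focal_loss(pred: Sequence[float], target: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean binary focal loss over flattened per-pixel probabilities."""
    eps = 1e-7
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        p_t = p if t == 1 else 1.0 - p        # probability assigned to the true class
        a_t = alpha if t == 1 else 1.0 - alpha
        # Well-classified pixels (p_t near 1) are down-weighted by (1 - p_t) ** gamma.
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)


def huber_loss(pred: Sequence[float], target: Sequence[float], delta: float = 1.0) -> float:
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(pred)


if __name__ == "__main__":
    print(focal_loss([0.9, 0.1], [1, 0]), focal_loss([0.1, 0.9], [1, 0]))
    print(huber_loss([0.5, 3.0], [0.0, 0.0]))
```

The focal term matters here because feature point pixels are a very small fraction of each map, so an unweighted cross-entropy would be dominated by easy background pixels.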

Image data enhancement method for pressure vessel weld surface profile based on third-order NURBS curve
A CNN requires a large amount of data for training, and the size of the training set is directly related to the CNN's ability to extract feature points. The conventional approach of producing training sets involves acquiring contour images with an active vision imaging device and manually labeling the location coordinates of the feature points in each image. To avoid this, we provide a surface parameter simulation method that allows multiple defects to coexist on the weld surface of the pressure vessel. The method allows the types of weld parameters, number of feature points, and parameter values to be varied, and thereby effectively reduces the amount of training data that must be collected. A simulation diagram of the normal weld curve is shown in Fig 9. The parent material area of the curve in the image coordinate system, $F_{metal}(x, y)$, is expressed as follows: where $Set_{width}$ denotes the weld width parameter, $R_{stand}$ denotes the diameter of the pressure vessel cylinder, $L_{pic}$ denotes the imaging area length, and $W_{pic}$ denotes the width. The simulation feature point positions of the normal weld contour curve when the simulated reinforcement is $Set_{re}$ are presented in Table 1.
A non-uniform rational B-spline (NURBS) curve [34] was constructed to simulate the weld zone curve based on the aforementioned three points: $P_{re}$, $P^{left}_{width}$, and $P^{right}_{width}$. Assuming that the order of the NURBS curve is k = 3, $d_i = [x, y]^T$ are the curve control points, $w_i$ is the weight factor of control point $d_i$, and $w_0, w_2 > 0$ with the remaining $w_i \geq 0$, $F_{weld}(u)$ can be expressed as follows:
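For reference, the standard rational form of a NURBS curve, a weighted sum of control points divided by the weighted sum of B-spline basis functions, can be evaluated with the Cox-de Boor recursion. The sketch below is a generic textbook evaluation and is not taken from the study; the control points, weights, and knot vector in the example are made-up values, and the names `basis` and `nurbs_point` are hypothetical.

```python
from typing import Sequence, Tuple


def basis(i: int, p: int, u: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u equal to the final knot is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u: float, ctrl: Sequence[Tuple[float, float]], weights: Sequence[float],
                knots: Sequence[float], degree: int = 3) -> Tuple[float, float]:
    """Evaluate the rational curve at parameter u."""
    num_x = num_y = den = 0.0
    for i, ((x, y), w) in enumerate(zip(ctrl, weights)):
        b = w * basis(i, degree, u, knots)
        num_x += b * x
        num_y += b * y
        den += b
    return num_x / den, num_y / den


if __name__ == "__main__":
    ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (4.0, 0.0)]  # illustrative only
    weights = [1.0, 2.0, 1.0, 2.0, 1.0]
    knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
    for u in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(u, nurbs_point(u, ctrl, weights, knots))
```

With a clamped knot vector, the curve starts at the first control point and ends at the last one, which is why the end knots are repeated.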

PLOS ONE
Furthermore, the control points $d_i$ can be inversely calculated from the three NURBS curve data points $P_{re}$, $P^{left}_{width}$, and $P^{right}_{width}$. Let the node vector be $U = [u_0, u_1, u_2, \dots, u_{n+4}]$ for an open curve, and let the number of control points be n = m + k − 1 = 5. For the curve to pass through the first and last control points, the end nodes of the node vector should have a multiplicity of k + 1, and the node vector is set using the accumulated chord length method. This implies that $u_0 = u_1 = u_2 = u_3 = 0$ and $u_5 = u_6 = u_7 = u_8 = 1$, with the interior node determined by the chord lengths $|p_i - p_{i-1}|$. The weight factors $w_i$ of the curve control points are set as follows: The beginning and end points of the curve control points were consistent with those of the data points throughout the reverse solution procedure. The connection points of the curved segments correspond to the nodes of the NURBS curve-defining domain. Therefore, if the data point $q_i$ corresponds to the node value $u_{i+3}$ (i = 0, 1, 2), then the solution condition for $d_i$ is as follows: To complete the control-point solution, the following two equations provided by the tangent vector boundary conditions must be included: By combining the above equations, we can solve for the coordinate positions $d_i$ and construct the curve $F_{weld}(u)$.
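The node vector construction described above can be sketched as follows, assuming that the interior nodes are the normalized accumulated chord lengths between consecutive data points and that each end node is repeated k + 1 times. The function name `clamped_knots_from_data` and the example points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def clamped_knots_from_data(data_points: Sequence[Tuple[float, float]], k: int = 3) -> List[float]:
    """Clamped node vector whose interior nodes follow the accumulated chord length."""
    chords = [math.dist(p, q) for p, q in zip(data_points, data_points[1:])]
    total = sum(chords)
    interior, acc = [], 0.0
    for c in chords[:-1]:  # the last chord ends at the final node value 1
        acc += c
        interior.append(acc / total)
    return [0.0] * (k + 1) + interior + [1.0] * (k + 1)


if __name__ == "__main__":
    # Three illustrative data points, e.g. left width point, reinforcement point, right width point.
    print(clamped_knots_from_data([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]))
```

For three data points this yields nine node values, four zeros, one interior node, and four ones, which matches the indices $u_0$ to $u_8$ used in the text.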
Defect parameter undercut simulation. Let the undercut width, depth, and offset be $\delta_{un\_cut}$, $Set_{un\_cut}$, and $\Delta_{un\_cut}$, respectively, where the offset indicates that the undercut feature point and width feature point are separated in the X-axis direction. Then, the value range of x in the curve $F^{right}_{metal}(x, y)$ of the parent metal area on the right is modified to $x \in [L_{pic} + Set_{width} - \delta_{un\_cut}, W_{pic}]$. The positions of the simulated feature points of the undercut weld profile curve with defect parameters are listed in Table 2.
To perform the single-defect undercut weld contour curve simulation, the five point coordinates $P_{re}$, $P^{left}_{width}$, $P^{right}_{width}$, $P_{un\_cut}$, and $P_{mis\_align}$ can be used as the NURBS curve data points, as shown in Fig 10.
Defect parameter misalignment simulation. In this case, the width feature points on both sides of the weld coincide with the misalignment feature points, and the curve $F^{left}_{metal}(x, y)$ of the base material area on the left is consistent with the simulation of the normal weld. The curve $F^{right}_{metal}(x, y)$ of the base material area on the right is offset by $Set_{mis\_align}$ along the Y-axis of the image coordinate system, and $F^{right}_{metal}(x, y)$ on the right side of the weld is as follows: The single-defect misaligned weld profile curve simulation was performed using the coordinates $P_{re}$, $P^{left}_{width}$, and $P^{right}_{width}$ as the NURBS curve data points, as shown in Fig 11.

Experiment on EDE-net performance of different backbone networks
Test CNN backbone selection. In the encoding-decoding image feature point extraction network, the accuracy of the overall feature point extraction is affected by the feature extraction performance of the CNN backbone network. Among the common CNN networks, including AlexNet [35], VGG [36], Res-Net [37], and Inception [38], the AlexNet and VGG networks are linear, branchless structures. The deeper such networks are, the more difficult they are to train, and common problems such as vanishing and exploding gradients can occur. To solve these problems, Res-Net introduces the residual unit in the convolutional layers and realizes identity mapping by constructing direct connections, so that the network does not degrade as more convolutional layers are stacked and the depth increases. The Inception network proposed a structure that obtains similar feature extraction capabilities with fewer network layers. The feature map was produced using several convolution and pooling kernels, and the results were stacked.
Table 2. Positions of the simulated feature points of the undercut weld profile curve with defect parameters. https://doi.org/10.1371/journal.pone.0267743.t002
Feature point | X coordinate | Y coordinate
$P_{un\_cut}$ | $L_{pic} + Set_{width} - \delta_{un\_cut} - \Delta_{un\_cut}$ | $F^{right}_{metal}(L_{pic} + Set_{width} - \delta_{un\_cut}) - Set_{un\_cut}$
$P_{mis\_align}$ | $L_{pic} + Set_{width} - \delta_{un\_cut}$ | $F^{right}_{metal}(L_{pic} + Set_{width} - \delta_{un\_cut})$
Table 3 lists the tested CNN network information, where Top1 accuracy is the ImageNet image classification result of the CNN structure and can be used as the performance level of the network. The number of trainable parameters indicates the complexity of network training. Top1 accuracy serves as the standard of performance: the higher the accuracy, the better the network performance.
Selection of training sets. Depending on whether the undercut defect parameter is present, the number of weld parameter feature points in an image is 3, 5, or 7, which gives the datasets $D_3$, $D_5$, and $D_7$. A Keyence LJ-V7080 sensor was used to collect the surface contour point sets of boiler butt type B welds and pressure pipeline type A welds with diameters of 1300 and 255 mm, respectively. These surface contour point sets comprised the training and test sets. Fig 12 shows simulation-generated contour images and acquired contour images.
To augment the aforementioned image data, an affine transformation was adopted. Let the image size be $W_D \times H_D$, the image rotation angle be β, and the image scaling factor be $h_{scale}$. Then, the affine matrix $W_{warp}$ is determined as follows: One hundred physically collected samples and 100 simulation-generated samples from datasets $D_3$, $D_5$, and $D_7$ were selected, with a random rotation angle β = 0-30° and a random scaling factor $h_{scale}$ = 0.5-0.8. Table 4 contains the training hyperparameter information, and the network training loss function is used as the evaluation index to determine the ideal CNN structure. Fig 13(A)-13(C) show the trends of the loss function of the encoding-decoding deep feature point extraction network with different CNN structures. The figures show that (a) without fine-tuning (transfer learning), the CNN structure is more difficult to train, the loss does not converge, and the network model fails. After fine-tuning, the training difficulty of the CNN structure decreased dramatically as the number of training steps increased, and the final convergence improved. (b) For the same CNN structure, network training is more effective when the network has more layers. (c) The higher the Top1 accuracy of the CNN structure, the better the training result. For instance, the Inception-Resnet network structure exhibited a Top1 accuracy of 80.4, and its training loss was lower than that of the ResNet series CNNs.
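One plausible form of such an augmentation matrix is a rotation by β about the image centre combined with a uniform scaling by $h_{scale}$, which is the convention used by common image libraries. The sketch below assumes this form; it is not necessarily the exact $W_{warp}$ of the study, and the names `affine_matrix` and `warp_point` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def affine_matrix(width: float, height: float, beta_deg: float, h_scale: float) -> List[List[float]]:
    """2x3 matrix that rotates by beta about the image centre and scales by h_scale."""
    cx, cy = width / 2.0, height / 2.0
    a = h_scale * math.cos(math.radians(beta_deg))
    b = h_scale * math.sin(math.radians(beta_deg))
    return [
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ]


def warp_point(m: List[List[float]], x: float, y: float) -> Tuple[float, float]:
    """Apply the 2x3 affine matrix to one labelled feature point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    # Ranges taken from the text: beta in [0, 30] degrees, h_scale in [0.5, 0.8].
    beta = random.uniform(0.0, 30.0)
    scale = random.uniform(0.5, 0.8)
    m = affine_matrix(640, 480, beta, scale)
    print(warp_point(m, 100.0, 200.0))
```

The labelled feature point coordinates must be transformed with the same matrix as the image so that the labels remain valid after augmentation.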

Experiment on measurement results for actual pressure vessel welds
The experiment was initiated by selecting 150 physically collected samples and 60 simulation-generated samples from datasets $D_3$, $D_5$, and $D_7$ with a random rotation angle β = 0-30° and a random scaling factor $h_{scale}$ = 0.5-0.8. Two training sets were formed: one with 90 physically collected samples, and one with 60 physically collected samples together with 60 simulation-generated samples. Both were used to train DeepLabCut, HR-net, and EDE-net (Inception-Resnet). Finally, the remaining 60 physically collected samples from $D_3$, $D_5$, and $D_7$ were used as the test set, and the accuracy of the parameter feature point extraction was used as the network performance evaluation index.
If the Euclidean distance between the manually marked parameter feature point and the network output feature point is used as the evaluation index, all the parameter feature points are regarded as feature points of the same nature. However, the reinforcement and undercut feature points are extreme-value points, whose characteristics are stronger than those of the width and misalignment feature points, which are inflection points. Given that the feature points of the weld parameters differ in marking and extraction difficulty, using the plain Euclidean distance as the evaluation index is not appropriate. Consequently, the object key-point similarity (OKS), a weighted Euclidean distance, was introduced as the evaluation index of the network output parameter feature points. It is defined as follows: where S denotes the presence of an undercut in the weld image, $d_i^S$ denotes the square of the Euclidean distance between the manually marked parameter feature point and the network output feature point, $A_i^S$ denotes the pixel area of the weld curve in the image, $\sigma_i^2$ denotes the deviation between the manually marked feature point and the real position of the feature point, and $d_i^S / A_i^S$ is replaced by the mean square error in the numerical calculation. Currently, average precision (AP) is used as the performance evaluation index for comparing different deep network structures on the same task. The relationship between the OKS and AP indicators is as follows: where $T_s$ denotes the OKS threshold. The network output feature points deviate from the actual feature points. The accuracy of the feature point extraction can be improved further if the network output feature points are returned to the laser curve using the following equation.
where $(x_{out}, y_{out})$ denotes the coordinate of the characteristic point of the welded seam output by the network, $(x_{Cor}, y_{Cor})$ denotes the coordinate of the characteristic point on the laser line after correction, and $(x_R, y_R)$ denotes the coordinate of any point on the contour line. The network results are presented in Table 5. As shown in Table 5, the AP, AP0.5, and AP07 of EDE-net (Inception-Resnet) are better than those of the DeepLabCut and HR-net networks both before and after correction, and the correction significantly improves the feature point extraction accuracy. The extraction accuracies were similar for the two training sets. However, training with both actual measurements and simulations effectively reduces the data-collection workload. Table 6 lists the absolute error and standard deviation data for all the parameter feature points. The measurement resolutions of the X-axis and Y-axis of the sensor were 0.005 and 0.001 mm, respectively. If we select a 99.73% confidence interval [μ-3σ, μ+3σ], then the confidence interval of the reinforcement feature point extraction error is [-10.958, 10.8629], and the theoretical measurement accuracy can be as high as 0.011 mm. Similarly, for the other parameters, the confidence interval of the width feature point extraction error was [-9.182, 13.069], and the theoretical measurement accuracy was as high as 0.065 mm. The confidence interval of the undercut feature point extraction error was [-10.245, 8.885], and the theoretical measurement accuracy was as high as 0.011 mm. The confidence interval of the misalignment feature point extraction error was [-11.833, 11.833], and the theoretical measurement accuracy was as high as 0.012 mm.
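Two steps from this passage can be sketched under explicit assumptions. First, the correction is assumed here to select the contour point with the smallest Euclidean distance to the network output; the correction equation itself is not reproduced in the text, so this is only one plausible reading. Second, the theoretical accuracy is assumed to be the larger absolute bound of the confidence interval multiplied by the sensor resolution of the relevant axis, a reading that reproduces the reported width, reinforcement, and misalignment values to three decimals. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def snap_to_contour(point: Point, contour: Sequence[Point]) -> Point:
    """Return the contour point closest (Euclidean distance) to the network output."""
    return min(contour, key=lambda r: math.dist(point, r))


def theoretical_accuracy_mm(interval: Tuple[float, float], resolution_mm: float) -> float:
    """Largest absolute bound of the interval, converted with the axis resolution."""
    low, high = interval
    return max(abs(low), abs(high)) * resolution_mm


if __name__ == "__main__":
    contour = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # illustrative contour points
    print(snap_to_contour((1.2, 0.9), contour))
    # Width is measured along the X-axis (0.005 mm), reinforcement along the Y-axis (0.001 mm).
    print(round(theoretical_accuracy_mm((-9.182, 13.069), 0.005), 3))
    print(round(theoretical_accuracy_mm((-10.958, 10.8629), 0.001), 3))
```

Because the correction only selects among measured contour points, the corrected feature point always lies on the laser line.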

Discussion
The measurement indicators for the surface parameters of a single weld specified in the relevant verification standards for pressure vessels cannot be effectively used to measure an actual weld surface profile where multiple defects coexist. In this study, the appearance characteristics of the weld surface parameters were expressed as image feature points, and an algorithm for regressing the feature point coordinates from the image was proposed using the excellent nonlinear mapping ability of CNNs. An image feature point extraction network based on deep learning was designed to simultaneously extract all the parameter feature points. For network training, a method that simulates realistic weld surface profiles with the third-order NURBS curve was proposed to enhance the training data. Finally, an experimental device was designed to collect the surface data of type A and type B butt welds, and the deep learning network proposed in this study was compared with the DeepLabCut and HR-net methods under different training sets. The results show that the AP of the network trained on the augmented training set differs little from that of the network trained entirely on measured data. Moreover, the data enhancement method can effectively reduce the workload of sample collection, and the theoretical accuracy of parameter measurement can be realized within 0.065 mm.